Abstract. Most specification languages express only qualitative constraints.
However, among two implementations that satisfy a given specification,
one may be preferred to another. For example, if a specification
asks that every request is followed by a response, one may prefer
an implementation that generates responses quickly but does not gen- 
erate unnecessary responses. We use quantitative properties to measure 
the "goodness" of an implementation. Using games with corresponding 
quantitative objectives, we can synthesize "optimal" implementations, 
which are preferred among the set of possible implementations that sat- 
isfy a given specification. 

In particular, we show how automata with lexicographic mean-payoff 
conditions can be used to express many interesting quantitative proper- 
ties for reactive systems. In this framework, the synthesis of optimal im- 
plementations requires the solution of lexicographic mean-payoff games 
(for safety requirements), and the solution of games with both lexico- 
graphic mean-payoff and parity objectives (for liveness requirements). 
We present algorithms for solving both kinds of novel graph games. 



1 Introduction 

Traditional specifications are Boolean: an implementation satisfies a specifica- 
tion, or it does not. This Manichean view is not entirely satisfactory: There are 
usually many different ways to satisfy a specification, and we may prefer one im- 
plementation over another. This is especially important when we automatically 
synthesize implementations from a specification, because we have no other way to 
enforce these preferences. In this paper, we add a quantitative aspect to system 
specification, imposing a preference order on the implementations that satisfy 
the qualitative part of the specification. Then, we present synthesis algorithms 
that construct, from a given specification with both qualitative and quantitative 
aspects, an implementation that (i) satisfies the qualitative aspect and (ii) is 
optimal or near-optimal with respect to the quantitative aspect. Along the way, 
we introduce and solve graph games with new kinds of objectives, namely, lexico- 
graphic mean-payoff objectives and the combination of parity and lexicographic 
mean-payoff objectives. 

Suppose we want to specify an arbiter for a shared resource. For each
client $i$, the arbiter has an input $r_i$ (request access) and an output $g_i$
(access granted). A first attempt at a specification in LTL may be
$\bigwedge_i G(r_i \rightarrow F g_i) \wedge G \bigwedge_i \bigwedge_{j \neq i} (\neg g_i \vee \neg g_j)$.
(All requests are granted eventually and two grants
never occur simultaneously.) This specification is too weak: An implementation
that raises all $g_i$ signals in a round-robin fashion satisfies the specification but
is probably undesired. The unwanted behaviors can be ruled out by adding the
requirements $\bigwedge_i G(g_i \rightarrow X(\neg g_i \, W \, r_i)) \wedge \bigwedge_i \neg g_i \, W \, r_i$.
(No second grant before a request.)

Such Boolean requirements to rule out trivial but undesirable implementa- 
tions have several drawbacks: (i) they are easy to forget and difficult to get right 
(often leading to unrealizable specifications) and, perhaps more importantly, 
(ii) they constrain implementations unnecessarily, by giving up the abstract 
quality of a clean specification. In our example, we would rather say that the 
implementation should produce "as few unnecessary grants as possible" (where 
a grant $g_i$ is unnecessary if there is no outstanding request $r_i$). We will add a
quantitative aspect to specifications which allows us to say that. Specifically,
we will assign a real-valued reward to each behavior, and the more unnecessary
grants, the lower the reward.

A second reason that the arbiter specification may give rise to undesirable 
implementations is that it may wait arbitrarily long before producing a grant. 
Requiring that grants come within a fixed number of steps instead of "eventually" 
is not robust, because it depends on the step size of the implementation and the 
number of clients. Rather, we assign a lower reward to executions with larger 
distances between a request and corresponding grant. If we use rewards both 
for punishing unnecessary grants and for punishing late grants, then these two 
rewards need to be combined. This leads us to consider tuples of costs that are 
ordered lexicographically. We define the quantitative aspect of a specification 
using lexicographic mean-payoff automata, which assign a tuple of costs to each
transition. The cost of an infinite run is obtained by taking, for each component 
of the tuple, the long-run average of all transition costs. Such automata can 
be used to specify both "produce as few unnecessary grants as possible" and 
"produce grants as quickly as possible," and combinations thereof. 

If the qualitative aspect of the specification is a safety property, then syn- 
thesis requires the solution of lexicographic mean-payoff games, for which we can 
synthesize optimal solutions. (The objective is to minimize the cost of an infinite 
run lexicographically.) If the qualitative aspect is a liveness property, then we 
obtain lexicographic mean-payoff parity games, which must additionally satisfy a 
parity objective. We present the solution of these games in this paper. We show 
that lexicographic mean-payoff games are determined for memoryless strategies 
and can be decided in NP $\cap$ coNP, but that in general optimal strategies for
lexicographic mean-payoff parity games require infinite memory. We prove, however,
that for any given real vector $\varepsilon > 0$, there exists a finite-state strategy
that ensures a value within $\varepsilon$ of the optimal value. This allows us to
synthesize $\varepsilon$-optimal implementations, for any $\varepsilon$. The complexity class of the optimal
synthesis problem is NP.

Related work. There are several formalisms for quantitative specifications in the 
literature [2,4-7, 10, 11, 14, 15, 19]; most of these works (other than [2, 7, 10]) do 
not consider mean-payoff specifications and none of these works focus on how 



quantitative specifications can be used to obtain better implementations for the 
synthesis problem. Several notions of metrics have been proposed in the literature 
for probabilistic systems and games [12,13]; these metrics provide a measure 
that indicates how close two systems are with respect to all temporal properties
expressible in a logic, whereas our work compares how good an implementation
is with respect to a given specification. The work [9] considers non-zero-sum 
games with lexicographic ordering on the payoff profiles, but to the best of our 
knowledge, the lexicographic quantitative objective we consider for games has 
not been studied before. 

2 Examples 

After giving necessary definitions, we illustrate with several examples how quan- 
titative constraints can be a useful measure for the quality of an implementation. 
Alphabets, vectors, and lexicographic order. Let $I$ and $O$ be finite sets of
input and output signals, respectively. We define the input alphabet $\Sigma_I = 2^I$ and
the output alphabet $\Sigma_O = 2^O$. The joint alphabet $\Sigma$ is defined as $\Sigma = 2^{I \cup O}$. Let
$\mathbb{R}^d$ be the set of real vectors of dimension $d$ with the usual lexicographic order.
Mealy machines. A Mealy machine is a tuple $M = (Q, q_0, \delta)$, where $Q$ is
a finite set of states, $q_0 \in Q$ is the initial state, and $\delta \subseteq Q \times \Sigma_I \times \Sigma_O \times Q$
is a set of labeled edges. We require that the machine is input enabled and
deterministic: $\forall q \in Q .\, \forall i \in \Sigma_I$, there exists a unique $o \in \Sigma_O$ and a unique
$q' \in Q$ such that $(q, i, o, q') \in \delta$. Each input word $i = i_0 i_1 \ldots \in \Sigma_I^\omega$ has a unique
run $q_0 i_0 o_0 q_1 i_1 o_1 \ldots$ such that $\forall k \geq 0 .\, (q_k, i_k, o_k, q_{k+1}) \in \delta$. The corresponding
I/O word is $i_0 \cup o_0, i_1 \cup o_1, \ldots \in \Sigma^\omega$. The language of $M$, denoted by $L_M$, is the
set of all I/O words of the machine. Given a language $L \subseteq \Sigma^\omega$, we say a Mealy
machine $M$ implements $L$ if $L_M \subseteq L$.
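
The definition of a Mealy machine can be made concrete with a small sketch. The
class below is only an illustration under stated assumptions: letters are encoded
as frozen sets of signal names, the transition relation is stored as a dictionary
(which enforces determinism), and the one-state machine `m` that grants exactly
when a request is present is a hypothetical example, not one of the machines of Fig. 1.

```python
from typing import Dict, FrozenSet, List, Tuple

Letter = FrozenSet[str]   # a letter is the set of signals that are true
State = str

class MealyMachine:
    """Input-enabled, deterministic Mealy machine (illustrative encoding)."""

    def __init__(self, initial: State,
                 delta: Dict[Tuple[State, Letter], Tuple[Letter, State]]):
        self.initial = initial
        # (state, input letter) -> (output letter, successor state)
        self.delta = delta

    def run(self, inputs: List[Letter]) -> List[Letter]:
        """Return the I/O word prefix i_0 ∪ o_0, i_1 ∪ o_1, ... for a finite input prefix."""
        state = self.initial
        word = []
        for i in inputs:
            o, state = self.delta[(state, i)]
            word.append(i | o)
        return word

R, NO_R = frozenset({"r"}), frozenset()
G, NO_G = frozenset({"g"}), frozenset()

# Hypothetical one-state machine: grant in the same step as each request.
m = MealyMachine("q0", {("q0", R): (G, "q0"), ("q0", NO_R): (NO_G, "q0")})
io_prefix = m.run([R, NO_R, R])
```

Since every produced letter satisfies "r implies g", every I/O word of `m`
lies in any language that only asks for requests to be answered within a bounded
number of steps.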

Quantitative languages. A quantitative language [7] over $\Sigma$ is a function
$L : \Sigma^\omega \rightarrow V$ that associates to each word in $\Sigma^\omega$ a value from $V$, where $V \subseteq \mathbb{R}^d$
has a least element. Words with a higher value are more desirable than those
with a lower value. In the remainder, we view an ordinary, qualitative language
as a quantitative language that maps words in $L$ to true (= 1) and words not in
$L$ to false (= 0). We often use a pair $(L, L')$ of a qualitative language $L$ and a
quantitative language $L' : \Sigma^\omega \rightarrow V$ as specification, where $L$ has higher priority
than $L'$. We can also view $(L, L')$ as a quantitative language with $(L, L')(w) = 0$
if $L(w) = 0$, and $(L, L')(w) = L'(w) - v_\bot + 1$ otherwise, where $v_\bot$ is the minimal
value in $V$. (Adding a constant does not change the order between words.)

We extend the definition of value to Mealy machines. As in verification and 
synthesis of qualitative languages, we take the worst-case behavior of the Mealy 
machine as a measure. Given a quantitative language $L$ over $\Sigma$, the value of a
Mealy machine $M$, denoted by $L(M)$, is $\inf_{w \in L_M} L(w)$.

Lexicographic mean-payoff automata. We use lexicographic mean-payoff
automata to describe quantitative languages. In lexicographic mean-payoff au- 
tomata each edge is mapped to a reward. The automaton associates a run with 
a word and assigns to the word the average reward of the edges taken (as in 
mean-payoff games [16]). Unlike in mean-payoff games, rewards are vectors. 
Fig. 1. Three Mealy machines that implement $G(r \rightarrow g \vee Xg)$.

Fig. 2. Two specifications that provide different ways of charging for grants.



Formally, a lexicographic mean-payoff automaton of dimension $d$ over $\Sigma$ is
a tuple $A = ((S, s_0, E), r)$, where $S$ is a set of states, $E \subseteq S \times \Sigma \times S$ is a
labeled set of edges, $s_0 \in S$ is the initial state, and $r : E \rightarrow \mathbb{N}^d$ is a reward
function that maps edges to $d$-vectors of natural numbers. Note that all rewards
are non-negative. We assume that the automaton is complete and deterministic:
for each $s$ and $\sigma$ there is exactly one $s'$ such that $(s, \sigma, s') \in E$. A word $w =
w_0 w_1 \ldots \in \Sigma^\omega$ has a unique run $\rho(w) = s_0 e_0 s_1 e_1 \ldots$ such that $s_i \in S$ and
$e_i = (s_i, w_i, s_{i+1}) \in E$ for all $i \geq 0$. The lexicographic mean payoff $LM(\rho)$ of a
run $\rho$ is defined as $LM(\rho) = \liminf_{n \rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n} r(e_i)$. The automaton defines a
quantitative language $L_A$ with domain $\Sigma^\omega$ by associating to every word $w$ the value
$L_A(w) = LM(\rho(w))$.
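
For ultimately periodic words the limit in this definition is simply the average
reward over the periodic part of the run, which makes the value easy to compute.
The following is a minimal sketch under stated assumptions: the function name
`lex_mean_payoff`, the dictionary encoding of edges, and the single-state
automaton `A1` (reward 1 on every step without a grant, 0 otherwise, following
the prose description of the first automaton in Fig. 2) are illustrative choices,
not the authors' construction. Letters are pairs `(r, g)` of Booleans.

```python
from fractions import Fraction

def lex_mean_payoff(transitions, reward, initial, prefix, cycle):
    """Value of the word prefix . cycle^omega in a deterministic automaton.

    transitions: (state, letter) -> state
    reward:      (state, letter) -> tuple of natural numbers (one per dimension)
    """
    state = initial
    for letter in prefix:
        state = transitions[(state, letter)]
    seen, sums = {}, []          # state at the start of each pass, reward sum of each pass
    while state not in seen:
        seen[state] = len(sums)
        total = None
        for letter in cycle:
            rew = reward[(state, letter)]
            total = rew if total is None else tuple(a + b for a, b in zip(total, rew))
            state = transitions[(state, letter)]
        sums.append(total)
    loop = sums[seen[state]:]    # passes that repeat forever
    steps = len(loop) * len(cycle)
    return tuple(Fraction(sum(s[k] for s in loop), steps) for k in range(len(loop[0])))

LETTERS = [(r, g) for r in (False, True) for g in (False, True)]
# Assumed encoding of A1: one state, reward 1 unless a grant is given.
A1_TRANS = {("s", l): "s" for l in LETTERS}
A1_REWARD = {("s", (r, g)): ((0,) if g else (1,)) for (r, g) in LETTERS}

# One grant in every two steps, and two grants in every three steps.
v1 = lex_mean_payoff(A1_TRANS, A1_REWARD, "s", [], [(True, True), (True, False)])
v2 = lex_mean_payoff(A1_TRANS, A1_REWARD, "s", [],
                     [(True, True), (False, True), (True, False)])
```

Because Python compares tuples lexicographically, values of dimension $d > 1$
can be ordered directly, e.g. `(Fraction(1), Fraction(1, 2)) > (Fraction(1, 2), Fraction(1))`.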

If the dimension of $A$ is 1 and the range of $L_A$ is $\{0, 1\}$ then, per definition,
$L_A$ defines a qualitative language. We say that $A$ is a safety automaton if it
defines a qualitative language and there is no path from an edge with reward 0
to an edge with reward $> 0$. Safety automata define safety languages [1]. Note
that in general, $\omega$-regular languages and languages expressible with mean-payoff
automata are incomparable [7].

Example 1. Let us consider a specification of an arbiter with one client. In
the following, we use $r$, $\bar{r}$, $g$, and $\bar{g}$ to represent that $r$ or $g$ are set to true and
false, respectively, and $\top$ to indicate that a signal can take either value. A slash
separates input and output.

Take the specification $\varphi = G(r \rightarrow g \vee Xg)$: every request is granted within
two steps. The corresponding language $L_\varphi$ maps a word $w = w_0 w_1 \ldots$ to true
iff for every position $i$ in $w$, if $r \in w_i$, then $g \in w_i \cup w_{i+1}$. Fig. 1 shows three
implementations for $L_\varphi$. Machine $M_1$ asserts $g$ continuously independent of $r$,
$M_2$ responds to each request with a grant but keeps $g$ low otherwise, and $M_3$
delays its response if possible.

We use a quantitative specification to state that we prefer an implementation
that avoids unnecessary grants. Fig. 2 shows two mean-payoff automata, $A_1$ and
$A_2$, that define rewards for the behavior of an implementation. Note that we
have summarized edges using Boolean algebra. For instance, an arc labeled $g$
in the figure corresponds to the edges labeled $rg$ and $\bar{r}g$. Automata $A_1$ and $A_2$
define quantitative languages that distinguish words by the frequency of grants
and the condition under which they appear. Specification $A_1$ defines a reward
of 1 except when a grant is given; $A_2$ only withholds a reward when a grant is
given unnecessarily. Consider the words $w_1 = (rg, r\bar{g})^\omega$ and $w_2 = (rg, \bar{r}g, r\bar{g})^\omega$.
Specification $A_1$ defines the rewards $L_{A_1}(w_1) = 1/2$ and $L_{A_1}(w_2) = 1/3$. For $A_2$,
we get $L_{A_2}(w_1) = 1$ and $L_{A_2}(w_2) = 2/3$. Both specifications are meaningful but
they express different preferences, which leads to different results for verification
and synthesis, as discussed in Section 4.

Fig. 3. A specification that rewards quick grants for a request from Client 1, and a
specification that rewards quick grants for both clients, while giving priority to Client 1.

Recall the three implementations in Fig. 1. Each of them implements $L_\varphi$.
For $A_1$, input $r^\omega$ gives the lowest reward. The values corresponding to the
input/output word of $M_1$, $M_2$, and $M_3$ are 0, 0, and 1/2, respectively. Thus, $A_1$
prefers the last implementation. The values of the implementations for $A_2$ are
minimal when the input is $\bar{r}^\omega$; they are 0, 1, and 1, respectively. Thus, $A_2$ prefers
the last two implementations, but does not distinguish between them.

Example 2. Assume we want to specify an arbiter for two clients that answers
requests within three steps. Simultaneous grants are forbidden. Formally, we
have $\varphi = \bigwedge_{i \in \{1,2\}} G(r_i \rightarrow \bigvee_{t \in \{0,1,2\}} X^t g_i) \wedge G(\neg g_1 \vee \neg g_2)$. We want grants to
come as quickly as possible. Fig. 3 shows a mean-payoff automaton $A_3$ that
rewards rapid replies to Client 1. Suppose we want to do the same for Client 2.
One option is to construct a similar automaton $A_3'$ for Client 2 and to add
the two resulting quantitative languages. This results in a quantitative language
$L_{A_3} + L_{A_3'}$ that treats the clients equally. Suppose instead that we want to give
Client 1 priority. In that case, we can construct a lexicographic mean-payoff
automaton that maps a word $w$ to a tuple $(s_1(w), s_2(w))$, where the first and
second elements correspond to the payoff for Clients 1 and 2, respectively. Part of this
automaton, $A_4$, is shown in Fig. 3.

Automaton $A_3$ distinguishes words with respect to the maximal average
distance between request and grant. For instance, $L_{A_3}((r_1 g_1, \bar{r}_1 \bar{g}_1)^\omega) = 1$ and
$L_{A_3}((r_1 \bar{g}_1, \bar{r}_1 g_1)^\omega) = 1/2$. Automaton $A_4$ associates a vector to every word. For
instance, $L_{A_4}((r_1 g_1 r_2 \bar{g}_2, \bar{r}_1 \bar{g}_1 \bar{r}_2 g_2)^\omega) = 1/2 \cdot ((1, 0) + (1, 1)) = (1, 1/2)$, which
makes it preferable to the word $(r_1 \bar{g}_1 r_2 g_2, \bar{r}_1 g_1 \bar{r}_2 \bar{g}_2)^\omega$, which has value $(1/2, 1)$.
This is what we expect: the first word gives priority to requests from Client 1,
while the second gives priority to Client 2.



Example 3. Let us consider the liveness specification $\varphi = G(r \rightarrow F g)$, saying
that every request must be granted eventually. This language can usefully be
combined with $A_3$, stating that grants must come quickly. It can also be
combined with $A_1$ from Fig. 2, stating that grants should occur as infrequently as
possible. A Mealy machine may emit a grant every $k$ ticks, which gives a reward
of $1 - 1/k$. Thus, there is an infinite chain of ever-better machines. There is
no Mealy machine, however, that obtains the limit reward of 1. This can only
be achieved by an implementation with infinite memory, for instance one that
answers requests only in cycle $2^k$ for all $k$ [8].
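
The reward claimed for the machine that grants every $k$ ticks can be checked by a
one-line calculation. The sketch assumes, as in the description of the first
automaton of Fig. 2, a reward of 1 on every step without a grant and 0 on every
step with a grant; the shorthand $(\bar{g}^{\,k-1} g)^\omega$ for the output pattern is
introduced here only for illustration.

```latex
\[
  L_{A_1}\bigl((\bar{g}^{\,k-1} g)^{\omega}\bigr)
  \;=\; \frac{(k-1)\cdot 1 + 1\cdot 0}{k}
  \;=\; 1 - \frac{1}{k} \;<\; 1,
  \qquad
  \sup_{k \ge 1}\Bigl(1 - \frac{1}{k}\Bigr) = 1 .
\]
```

The supremum is approached but never attained by any fixed $k$, which is why no
finite-state machine reaches the limit reward.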

3 Lexicographic Mean-Payoff (Parity) Games 

We show how to solve lexicographic mean-payoff games and lexicographic mean- 
payoff parity games, which we will need in Section 4 to solve the synthesis prob- 
lem for quantitative specifications. 

3.1 Notation and known results 

Game graphs and plays. A game graph over the alphabet $\Sigma$ is a tuple
$G = (S, s_0, E)$ consisting of a finite set of states $S$, partitioned into $S_1$ and
$S_2$, representing the states of Player 1 and Player 2, an initial state $s_0 \in S$,
and a finite set of labeled edges $E \subseteq S \times \Sigma \times S$. We require that the
labeling is deterministic, i.e., if $(s, \sigma, t), (s, \sigma, t') \in E$, then $t = t'$. We write
$\bar{E} = \{(s, t) \mid \exists \sigma \in \Sigma : (s, \sigma, t) \in E\}$. At $S_1$ states, Player 1 decides the
successor state and at $S_2$ states, Player 2 decides the successor state. We assume
that $\forall s \in S .\, \exists t \in S .\, (s, t) \in \bar{E}$. A play $\rho = \rho_0 \rho_1 \ldots \in S^\omega$ is an infinite sequence
of states such that for all $i \geq 0$ we have $(\rho_i, \rho_{i+1}) \in \bar{E}$. We denote the set of all
plays by $\Omega$.

The labels and the initial state are not relevant in this section. They are
used later to establish the connection between specifications, games, and Mealy 
machines. They also allow us to view automata as games with a single player. 

Strategies. Given a game graph $G = (S, s_0, E)$, a strategy for Player 1 is a
function $\pi_1 : S^* S_1 \rightarrow S$ such that $\forall s_0 \ldots s_i \in S^* S_1$ we have $(s_i, \pi_1(s_0 s_1 \ldots s_i)) \in
\bar{E}$. A Player-2 strategy is defined similarly. We denote the set of all
Player-$p$ strategies by $\Pi_p$. The outcome $\rho(\pi_1, \pi_2, s)$ of $\pi_1$ and $\pi_2$ on $G$ starting at
$s$ is the unique play $\rho = \rho_0 \rho_1 \ldots$ such that for all $i \geq 0$, if $\rho_i \in S_p$, then
$\rho_{i+1} = \pi_p(\rho_0 \ldots \rho_i)$ and $\rho_0 = s$.

A strategy $\pi_p \in \Pi_p$ is memoryless if for any two sequences $\sigma = s_0 \ldots s_i \in
S^* S_p$ and $\sigma' = s'_0 \ldots s'_{i'} \in S^* S_p$ such that $s_i = s'_{i'}$, we have $\pi_p(\sigma) = \pi_p(\sigma')$.
We represent a memoryless strategy $\pi_p$ simply as a function from $S_p$ to $S$. A
strategy is a finite-memory strategy if it needs only finite memory of the past, 
consisting of a finite-state machine that keeps track of the history of the play. The 
strategy chooses a move depending on the state of the machine and the location 
in the game. Strategies that are not finite-memory are called infinite-memory 
strategies. 
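
When both players use memoryless strategies, the outcome is an ultimately
periodic play, so it can be represented by a finite prefix and a cycle. The sketch
below illustrates this; the function name `outcome`, the dictionary encoding of
strategies, and the three-state example graph are hypothetical and chosen only
for illustration.

```python
def outcome(owner, pi1, pi2, start):
    """Outcome of two memoryless strategies as (prefix, cycle).

    owner: state -> 1 or 2 (the player who moves at that state)
    pi1, pi2: state -> successor state, defined on the states of the respective player
    """
    position = {}      # first index at which each state was visited
    play = []
    s = start
    while s not in position:
        position[s] = len(play)
        play.append(s)
        s = pi1[s] if owner[s] == 1 else pi2[s]
    k = position[s]    # the play returns to play[k] and repeats from there
    return play[:k], play[k:]

# Hypothetical game graph with states a, c owned by Player 1 and b by Player 2.
owner = {"a": 1, "b": 2, "c": 1}
pi1 = {"a": "b", "c": "b"}
pi2 = {"b": "c"}
prefix, cycle = outcome(owner, pi1, pi2, "a")
```

The play is then `prefix` followed by `cycle` repeated forever; here it is
a, b, c, b, c, and so on.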



Quantitative and qualitative objectives. We consider different objectives
for the players. A quantitative objective $f$ is a function $f : \Omega \rightarrow \mathbb{R}^d$ that assigns
a vector of reals as reward to every play. We consider complementary objectives
for the two players; i.e., if the objective for Player 1 is $f$, then the objective for
Player 2 is $-f$. The goal of each player is to maximize her objective. Note that
Player 2 tries to minimize $f$ by maximizing the complementary $-f$. An objective
$f : \Omega \rightarrow \{0, \pm 1\}$ that maps to the set $\{0, 1\}$ (or $\{0, -1\}$) is per definition a
qualitative objective. Given a qualitative objective $f : \Omega \rightarrow V$ we say a play
$\rho \in \Omega$ is winning for Player 1 if $f(\rho) = \max(V)$ holds, otherwise the play is
winning for Player 2.

Value. Given an objective $f$, the Player-1 value of a state $s$ for a strategy $\pi_1$ is
the minimal value Player 1 achieves using $\pi_1$ against all Player-2 strategies, i.e.,
$V_1(f, s, \pi_1) = \inf_{\pi_2 \in \Pi_2} f(\rho(\pi_1, \pi_2, s))$. The Player-1 value of a state $s$ is the
maximal value Player 1 can ensure from state $s$, i.e., $V_1(f, s) = \sup_{\pi_1 \in \Pi_1} V_1(f, s, \pi_1)$.
Player-2 values are defined analogously. If $V_1(f, s) + V_2(-f, s) = 0$ for all $s$, then
the game is determined and we call $V_1(f, s)$ the value of $s$.
Optimal, $\varepsilon$-optimal, and winning strategies. Given an objective $f$ and
a vector $\varepsilon \geq 0$, a Player-1 strategy $\pi_1$ is Player-1 $\varepsilon$-optimal from a state $s$ if
$V_1(f, s, \pi_1) \geq V_1(f, s) - \varepsilon$. If $\pi_1$ is 0-optimal from $s$, then we call $\pi_1$ optimal
from $s$. Optimality for Player-2 strategies is defined analogously. If $f : \Omega \rightarrow
V$ is a qualitative objective, a strategy $\pi_1$ is winning for Player 1 from $s$ if
$V_1(f, s, \pi_1) = \max(V)$.

We now define various objectives. 
Parity objectives. A parity objective consists of a priority function $p : S \rightarrow
\{0, 1, \ldots, k\}$ that maps every state in $S$ to a number (called priority) between
0 and $k$. We denote by $|p|$ the maximal priority (i.e., $|p| = k$). The objective
function $P$ of Player 1 maps a play $\rho$ to 1 if the smallest priority visited infinitely
often is even, otherwise $\rho$ is mapped to 0.

Lexicographic mean-payoff objectives. A lexicographic mean-payoff
objective consists of a reward function $r : E \rightarrow \mathbb{N}^d$ that maps every edge
in $G$ to a $d$-vector (called reward) of natural numbers. We define $|r| =
\prod_{1 \leq i \leq d} \max_{e \in E} r_i(e)$, where $r_i(e)$ is the $i$-th component of $r(e)$. The objective
function of Player 1 for a play $\rho$ is the lexicographic mean payoff $LM_r(\rho) =
\liminf_{n \rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n} r((\rho_i, \rho_{i+1}))$. If $d = 1$, then $LM_r(\rho)$ is the mean payoff [16]
and we refer to it as $M_r(\rho)$.

Lexicographic mean-payoff parity objectives. A lexicographic mean-payoff
parity objective has a priority function $p : S \rightarrow \{0, 1, \ldots, k\}$ and a reward
function $r : E \rightarrow \mathbb{N}^d$. The lexicographic mean-payoff parity value $LMP_{r,p}(\rho)$ for
Player 1 of a play $\rho$ is the lexicographic mean payoff $LM_r(\rho)$ if $\rho$ is winning
for the parity objective (i.e., $P_p(\rho) = 1$), else the payoff is $-1$. If $d = 1$, then
$LMP_{r,p}(\rho)$ defines the mean-payoff parity value [8] and we write $MP_{r,p}(\rho)$. If $p$
or $r$ are clear from the context, we omit them.
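
For a play that eventually repeats a cycle forever, the states visited infinitely
often are exactly the cycle states, so the lexicographic mean-payoff parity value
can be read off the cycle. The following sketch assumes this ultimately periodic
setting; the function name `lmp_value` and the two-state example are hypothetical
and serve only to illustrate the definition.

```python
from fractions import Fraction

def lmp_value(cycle, priority, reward):
    """Lexicographic mean-payoff parity value of a play that repeats `cycle` forever.

    cycle:    list of states visited in order, wrapping around at the end
    priority: state -> natural number
    reward:   (state, successor) -> tuple of natural numbers
    Returns -1 if the smallest priority on the cycle is odd, as in the definition.
    """
    if min(priority[s] for s in cycle) % 2 != 0:
        return -1
    n = len(cycle)
    edges = [(cycle[i], cycle[(i + 1) % n]) for i in range(n)]
    d = len(reward[edges[0]])
    return tuple(Fraction(sum(reward[e][k] for e in edges), n) for k in range(d))

# Hypothetical two-state cycle with two-dimensional rewards.
reward = {("s0", "s1"): (2, 0), ("s1", "s0"): (0, 4)}
good = lmp_value(["s0", "s1"], {"s0": 1, "s1": 0}, reward)   # smallest priority 0 is even
bad = lmp_value(["s0", "s1"], {"s0": 1, "s1": 3}, reward)    # smallest priority 1 is odd
```

A finite prefix before the cycle does not affect either the parity condition or
the long-run average, so it is omitted from the interface.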

Games and automata. A game is a tuple $\mathcal{G} = (G, f)$ consisting of a game
graph $G = (S, s_0, E)$ and an objective $f$. An automaton is a game with only one
player, i.e., $S = S_1$. We name games and automata after their objectives.



3.2 Lexicographic mean-payoff games

In this section, we prove that memoryless strategies are sufficient for
lexicographic mean-payoff games and we present an algorithm to decide these games
by a reduction to simple mean-payoff games. We first present the solution of
lexicographic mean-payoff games with a reward function with two components,
and then extend it to $d$-dimensional reward functions. Consider a lexicographic
mean-payoff game $\mathcal{G}_{LM} = ((S, s_0, E), r)$, where $r = (r_1, r_2)$ consists of two
reward functions.

Memoryless strategies suffice. We show that memoryless strategies suffice
by a reduction to a finite-cycle forming game. Let us assume we have solved
the mean-payoff game with respect to the reward function $r_1$. Consider a value
class of $r_1$, i.e., a set of states having the same value with respect to $r_1$. It is
not possible for Player 1 to move to a higher value class, and Player 1 will never
choose an edge to a lower value class. Similarly, Player 2 does not have edges to
a lower value class and will never choose edges to a higher value class. Thus, we
can consider the sub-game for a value class.

Consider a value class of value $\ell$ and the sub-game induced by the value
class. We now play the following finite-cycle forming game: Player 1 and Player 2
choose edges until a cycle $C$ is formed. The payoff for the game is as follows: (1)
If the mean-payoff value of the cycle $C$ for $r_1$ is greater than $\ell$, then Player 1
gets reward $|r_2| + 1$. (2) If the mean-payoff value of the cycle $C$ for $r_1$ is smaller
than $\ell$, then Player 1 gets reward $-1$. (3) If the mean-payoff value of the cycle
$C$ for $r_1$ is exactly $\ell$, then Player 1 gets the mean-payoff value for reward $r_2$ of
the cycle $C$.
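
The three payoff cases of the finite-cycle forming game translate directly into a
small function. This is a sketch under stated assumptions: the name
`cycle_payoff`, the representation of a cycle as a list of edges, and the
parameter `r2_bound` (standing for $|r_2|$, here taken to be the largest $r_2$
reward on any edge) are illustrative choices.

```python
from fractions import Fraction

def cycle_payoff(cycle_edges, r1, r2, ell, r2_bound):
    """Payoff to Player 1 once the cycle `cycle_edges` has been formed.

    r1, r2:   edge -> natural number
    ell:      value of the current value class with respect to r1
    r2_bound: |r2|, an upper bound on every r2 reward
    """
    n = len(cycle_edges)
    mean1 = Fraction(sum(r1[e] for e in cycle_edges), n)
    if mean1 > ell:                      # case (1): better than the class value for r1
        return Fraction(r2_bound + 1)
    if mean1 < ell:                      # case (2): worse than the class value for r1
        return Fraction(-1)
    return Fraction(sum(r2[e] for e in cycle_edges), n)   # case (3): tie, decided by r2

# Hypothetical two-edge cycle: r1-mean 1, r2-mean 2, largest r2 reward 3.
edges = [("a", "b"), ("b", "a")]
r1 = {("a", "b"): 2, ("b", "a"): 0}
r2 = {("a", "b"): 3, ("b", "a"): 1}
tie = cycle_payoff(edges, r1, r2, ell=1, r2_bound=3)
above = cycle_payoff(edges, r1, r2, ell=Fraction(1, 2), r2_bound=3)
below = cycle_payoff(edges, r1, r2, ell=2, r2_bound=3)
```

The two sentinel payoffs lie strictly outside the range of possible $r_2$ means,
so a cycle that beats (or loses) on $r_1$ always dominates (or is dominated by)
any cycle that merely ties on $r_1$.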

Lemma 1. The value of Player 1 for any state in the finite-cycle forming game
is (i) strictly greater than $-1$ and (ii) strictly less than $|r_2| + 1$.
Lemma 2. Both players have memoryless optimal strategies in the finite-cycle
forming game.

Proof. The result can be obtained from the result of Björklund et al. [3]. From
Theorem 5.1 and the comment in Section 7.2 it follows that in any finite-cycle 
forming game in which the outcome depends only on the vertices that appear in 
the cycle (modulo cyclic permutations) we have that memoryless optimal strate- 
gies exist for both players. Our finite-cycle forming game satisfies the conditions. 

□ 

Lemma 3. The following assertions hold.

1. If the value of the finite-cycle forming game is $\beta$ at a state $s$, then the value
of the lexicographic mean-payoff game is $(\ell, \beta)$ at $s$.

2. A memoryless optimal strategy of the finite-cycle forming game is optimal
for the lexicographic mean-payoff game.

Proof. The proof has the following two parts.

1. Fix a memoryless optimal strategy $\pi_1$ for Player 1 for the finite-cycle forming
game; such a strategy exists by Lemma 2. Observe that by Lemma 1 we
have $\beta > -1$. In the resulting graph, for any cycle $C$ reachable from $s$, one of the
following properties holds (because of optimality of $\pi_1$):

(a) Property 1: the mean-payoff reward for $r_2$ is at least $\beta$ and the
mean-payoff reward for $r_1$ is at least $\ell$, or

(b) Property 2: the mean-payoff reward for $r_1$ is at least $\ell' > \ell$ (i.e., the
reward is strictly greater than $\ell$, and in this case the mean-payoff reward
for $r_2$ can be less than $\beta$).

Consider any strategy $\pi_2$ for Player 2 and the path $\rho = \rho(s, \pi_1, \pi_2)$ and
consider any prefix of length $n$ of $\rho$. The prefix can be decomposed as a finite
prefix of length at most $|S|$, then cycles in the graph, and then a trailing
prefix of length at most $|S|$. For a prefix of length $n$, let us denote by $J_1(n)$
the sum total of the steps of cycles in the prefix that satisfy Property 1,
and by $J_2(n)$ the sum total of the steps of cycles in the prefix that satisfy
Property 2. It follows that
$\lim_{n \rightarrow \infty} \frac{J_1(n) + J_2(n)}{n} \geq \lim_{n \rightarrow \infty} \frac{n - 2 \cdot |S|}{n} = 1$.
It follows that for any $n$ we have

(a) $\sum_{i=0}^{n} r_1((\rho_i, \rho_{i+1})) \geq J_1(n) \cdot \ell + J_2(n) \cdot \ell' - 2 \cdot |S| \cdot |R|$

and

(b) $\sum_{i=0}^{n} r_2((\rho_i, \rho_{i+1})) \geq J_1(n) \cdot \beta - (2 \cdot |S| + J_2(n)) \cdot |R|$,

where $|R| = \max\{|r_1|, |r_2|\}$. Let $\liminf_{n \rightarrow \infty} \frac{J_2(n)}{n} = \kappa$. The following
case analysis completes the result.

(a) If $\kappa > 0$, then we have

$$\liminf_{n \rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n} r_1((\rho_i, \rho_{i+1})) \geq (1 - \kappa) \cdot \ell + \kappa \cdot \ell' = \ell + \kappa \cdot (\ell' - \ell) > \ell.$$

(b) If $\kappa = 0$, then we have

$$\liminf_{n \rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n} r_1((\rho_i, \rho_{i+1})) \geq \ell$$

and

$$\liminf_{n \rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n} r_2((\rho_i, \rho_{i+1})) \geq \liminf_{n \rightarrow \infty} \left( \frac{J_1(n) \cdot \beta}{n} - \frac{2 \cdot |S|}{n} \cdot |R| - \frac{J_2(n)}{n} \cdot |R| \right).$$

It follows that $LM(\rho) \geq (\ell, \beta)$.

2. Fix a memoryless optimal strategy $\pi_2$ for Player 2 for the finite-cycle forming
game. Observe that by Lemma 1 we have $\beta < |r_2| + 1$. Then in the resulting
graph, for any cycle $C$ reachable from $s$, one of the following properties holds due
to optimality of $\pi_2$:

(a) Property 1: the mean-payoff reward for $r_1$ is at most $\ell$ and the
mean-payoff reward for $r_2$ is at most $\beta$; or

(b) Property 2: the mean-payoff reward for $r_1$ is at most $\ell' < \ell$ (the
mean-payoff reward for $r_1$ is strictly smaller than $\ell$).

By analysis similar to the previous case, it follows that the lexicographic
mean-payoff value is at most $(\ell, \beta)$. □



Reduction to mean-payoff games. We now sketch a reduction of lexicographic
mean-payoff games to mean-payoff games for optimal strategies. We
reduce the reward function $r = (r_1, r_2)$ to a single reward function $r^*$. We
ensure that if the mean-payoff difference of two cycles $C_1$ and $C_2$ for reward $r_1$ is
positive, then the difference in reward assigned by $r^*$ exceeds the largest possible
difference in the mean-payoff for reward $r_2$. Consider two cycles $C_1$ of length $n_1$
and $C_2$ of length $n_2$, such that the sum of the $r_1$ rewards of $C_i$ is $\alpha_i$. Since all
rewards are integral, $|\frac{\alpha_1}{n_1} - \frac{\alpha_2}{n_2}| > 0$ implies $|\frac{\alpha_1}{n_1} - \frac{\alpha_2}{n_2}| \geq \frac{1}{n_1 \cdot n_2}$. Hence we multiply
the $r_1$ rewards by $m = |S|^2 \cdot |r_2| + 1$ to obtain $r^* = m \cdot r_1 + r_2$. This ensures that if
the mean-payoff difference of two cycles $C_1$ and $C_2$ for reward $r_1$ is positive, then
the difference exceeds the difference in the mean-payoff for reward $r_2$. Observe
that we restrict our attention to cycles only since we have already proven that
optimal memoryless strategies exist.
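
The effect of the scaling factor can be checked on small cycles. The sketch below
is an illustration only: the helper names `combine` and `mean`, the edge names,
and the sample rewards are hypothetical. It builds the combined reward from the
factor $|S|^2 \cdot |r_2| + 1$, taking $|r_2|$ to be the largest $r_2$ reward, and
compares cycle means under the combined reward with the lexicographic order on
the pairs of means. The cycles are assumed to have length at most `num_states`.

```python
from fractions import Fraction

def combine(r1, r2, num_states):
    """Return (r_star, m) with r_star = m * r1 + r2 and m = |S|^2 * |r2| + 1."""
    m = num_states ** 2 * max(r2.values()) + 1
    return {e: m * r1[e] + r2[e] for e in r1}, m

def mean(reward, cycle):
    """Mean reward of a cycle given as a list of edges."""
    return Fraction(sum(reward[e] for e in cycle), len(cycle))

# Hypothetical edges of a 3-state game graph.
r1 = {"e1": 1, "e2": 0, "e3": 1, "e4": 0, "e5": 0, "e6": 1, "e7": 0}
r2 = {"e1": 0, "e2": 0, "e3": 5, "e4": 5, "e5": 5, "e6": 1, "e7": 1}
c1, c2, c3 = ["e1", "e2"], ["e3", "e4", "e5"], ["e6", "e7"]

r_star, m = combine(r1, r2, num_states=3)

def lex(cycle):
    return (mean(r1, cycle), mean(r2, cycle))

# c1 beats c2 on r1 (1/2 vs 1/3) although c2 is far better on r2 (5 vs 0).
first_wins = lex(c1) > lex(c2) and mean(r_star, c1) > mean(r_star, c2)
# c3 ties with c1 on r1 and wins on r2 (1 vs 0).
tie_break = lex(c3) > lex(c1) and mean(r_star, c3) > mean(r_star, c1)
```

In both comparisons the single reward function ranks the cycles exactly as the
lexicographic order does.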

We can easily extend this reduction to reduce lexicographic mean-payoff 
games with arbitrarily many reward functions to mean-payoff games. The fol- 
lowing theorem follows from this reduction in combination with known results 
for mean payoff parity games [16, 20]. 

Theorem 1 (Lexicographic mean-payoff games). For all lexicographic
mean-payoff games $\mathcal{G}_{LM} = ((S, s_0, E), r)$, the following assertions hold.

1. (Determinacy.) For all states $s \in S$, $V_1(LMP, s) + V_2(-LMP, s) = 0$.

2. (Memoryless optimal strategies.) Both players have memoryless optimal
strategies from every state $s \in S$.

3. (Complexity.) Whether the lexicographic mean-payoff value vector at a state
$s \in S$ is at least a rational value vector $v$ can be decided in NP $\cap$ coNP.

4. (Algorithms.) The lexicographic mean-payoff value vector for all states can
be computed in time 0(|5|^''+'^ ■ \E\ ■ \r\). 

3.3 Lexicographic Mean-Payoff Parity Games 

Lexicographic mean-payoff parity games are a natural lexicographic extension of
mean-payoff parity games [8]. The algorithmic solution for mean-payoff parity
games is a recursive algorithm, where each recursive step requires the solution of 
a parity objective and a mean-payoff objective. The key correctness argument of 
the algorithm relies on the existence of memoryless optimal strategies for parity 
and mean-payoff objectives. Since memoryless optimal strategies exist for lexi- 
cographic mean-payoff games, the solution of mean-payoff parity games extends 
to lexicographic mean-payoff parity games: in each recursive step, we replace the 
mean-payoff objective by a lexicographic mean-payoff objective. Thus, we have 
the following result. 

Fig. 4. Game in which the optimal strategy requires infinite memory.

Theorem 2 (Lexicographic mean-payoff parity games). For all lexicographic
mean-payoff parity games $\mathcal{G}_{LMP} = ((S, s_0, E), r, p)$, the following
assertions hold.

1. (Determinacy.) $V_1(LMP, s) + V_2(-LMP, s) = 0$ for all states $s \in S$.

2. (Optimal strategies.) Optimal strategies for Player 1 exist but may require
infinite memory; finite-memory optimal strategies exist for Player 2.

3. (Complexity.) Whether the value at a state $s \in S$ is at least the vector $v$ can
be decided in coNP.

4- (Algorithms). The value for all states can be computed in time 0(|S'|IpI • 
(min{|5|* • + l^p-i+a • \E\ ■ \r\)) . 

In the following, we prove two properties of mean-payoff parity games that are 
interesting for synthesis. For simplicity, we present the results for mean-payoff 
parity games. The results extend to lexicographic mean-payoff parity games as in 
Theorem 2. First, we show that the algorithm of [8] can be adapted to compute 
finite-memory strategies that are $\varepsilon$-optimal. Then, we show that Player 1 has
a finite-memory optimal strategy if and only if she has a memoryless optimal 
strategy. 

Finite-memory $\varepsilon$-optimal strategy. In mean-payoff parity games, though
optimal strategies require infinite memory for Player 1, there is a finite-memory
$\varepsilon$-optimal strategy for every $\varepsilon > 0$. The proof of this claim is obtained by a
more detailed analysis of the optimal strategy construction of [8]. The optimal
strategy constructed in [8] for Player 1 can be intuitively described as follows.
The strategy is played in rounds, and each round has three phases: (a) playing
a memoryless optimal mean-payoff strategy; (b) playing a strategy in a sub-game;
(c) playing a memoryless attractor strategy to reach a desired priority.
Then the strategy proceeds to the next round. The length of the first phase
is monotonically increasing in the number of rounds, and it requires infinite
memory to count the rounds. Given an $\varepsilon > 0$, we can fix a bound on the number
of steps in the first phase that ensures a payoff within $\varepsilon$ of the optimal value.
Hence, a finite-memory strategy can be obtained.

We illustrate the idea with an example. Consider the game graph shown in Fig. 4,
where all states belong to Player 1. The goal of
Player 1 is to maximize the mean-payoff while ensuring that state $s_1$ is visited
infinitely often. An optimal strategy is as follows: the game starts in round 1. In
each round $i$, the edge $s_0 \rightarrow s_0$ is chosen $i$ times, then the edge $s_0 \rightarrow s_1$ is chosen
once, and then the game proceeds to round $i + 1$. Any optimal strategy in the
game shown requires infinite memory. However, given $\varepsilon > 0$, in every round the
edge $s_0 \rightarrow s_0$ can be chosen a fixed number $K$ times such that $K \geq \frac{10}{\varepsilon} - 2$. Then
the payoff is $\frac{10 \cdot K + 10}{K + 2} = 10 - \frac{10}{K + 2} \geq 10 - \varepsilon$ (since $K \geq \frac{10}{\varepsilon} - 2$), which is within $\varepsilon$
of the value. It may also be noted that given $\varepsilon > 0$, the finite-memory $\varepsilon$-optimal
strategy can be obtained as follows. We apply the recursive algorithm to solve
the game to obtain two memoryless strategies: one for the mean-payoff strategy
and the other for the attractor strategy. We then specify the bound (depending on
$\varepsilon$) on the number of steps for the mean-payoff strategy for each phase (this
requires an additional $O(\frac{1}{\varepsilon})$ time for the strategy description after the recursive
algorithm).
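
The bound on the number of self-loop steps can be explored numerically. The
sketch below is hedged: the edge rewards of Fig. 4 are not reproduced in the text,
so the defaults (self-loop reward 10, a two-edge detour through $s_1$ with total
reward 10) are assumptions, as are the function names `round_payoff` and
`loops_needed`. With these assumptions the strategy that repeats "take the
self-loop $K$ times, then visit $s_1$" has the long-run average computed by
`round_payoff`.

```python
import math
from fractions import Fraction

def round_payoff(k, loop_reward=10, detour_reward=10, detour_len=2):
    """Mean payoff of repeating: k self-loops, then the detour through s1.

    All three keyword defaults are assumptions about the game of Fig. 4.
    """
    return Fraction(k * loop_reward + detour_reward, k + detour_len)

def loops_needed(eps, loop_reward=10, detour_reward=10, detour_len=2):
    """Smallest k whose payoff is within eps of loop_reward (eps > 0)."""
    gap = loop_reward * detour_len - detour_reward   # numerator of the shortfall
    return max(0, math.ceil(Fraction(gap) / eps - detour_len))

k1 = loops_needed(Fraction(1))       # eps = 1
k2 = loops_needed(Fraction(1, 2))    # eps = 1/2
p1, p2 = round_payoff(k1), round_payoff(k2)
```

Every fixed `k` gives a finite-memory strategy that visits $s_1$ infinitely often;
the shortfall shrinks as `k` grows but never reaches zero, which matches the
claim that optimality itself needs infinite memory.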

Theorem 3. For all lexicographic mean-payoff parity games and for all $\varepsilon > 0$,
there exists a finite-memory $\varepsilon$-optimal strategy for Player 1. Given $\varepsilon > 0$, a finite-memory
$\varepsilon$-optimal strategy can be constructed in time $O(|S|^{|p|} \cdot |E|^{4d+6} \cdot |r| + \frac{1}{\varepsilon})$.

Optimal finite-memory and memoryless strategies. Consider a mean-payoff
parity game $G = ((S, s_0, E), r, p)$. Our goal is to show that if there is a
finite-memory optimal strategy for Player 1, then there is a memoryless optimal
strategy for Player 1. Suppose there is a finite-memory optimal strategy $\pi_1$ for
Player 1. Consider the finite graph $G_{\pi_1}$ obtained by fixing the strategy $\pi_1$. ($G_{\pi_1}$ is
obtained as the synchronous product of the given game graph and the finite-state
strategy automaton for $\pi_1$.) For a state $s \in S$, consider any cycle $C$ in $G_{\pi_1}$ that is
reachable from $(s, q_0)$ (where $q_0$ is the initial memory location) and that is executed
to ensure that Player 1 does not achieve a payoff greater than the value of the
game from $s$. We denote by $C|_G$ the sequence of states in $G$ that appear in $C$.
We call a cycle $C'$ of $G$ that appears in $C|_G$ a component cycle of $C$. We have the
following properties about the cycle $C$ and its component cycles.

1. $\min(p(C|_G))$ is even.

2. Suppose there is a component cycle $C'$ of $C$ such that the average of the
rewards of $C'$ is greater than the average of the rewards of $C$. If Player 2
fixes a finite-memory strategy that corresponds to the execution of cycle $C$,
then an infinite-memory strategy can be played by Player 1 that pumps the
cycle $C'$ longer and longer to ensure a payoff that is equal to the average of
the weights of $C'$. The infinite-memory strategy ensures that all states in $C|_G$
are visited infinitely often, but the long-run average of the rewards is the
average of the rewards of $C'$. This would imply that for the cycle $C$, Player 1
can switch to an infinite-memory strategy and ensure a better payoff.

3. If there is a component cycle $C'$ of $C$ such that $\min(p(C')) > \min(p(C|_G))$, then
the cycle segment of $C'$ can be removed from $C$ without affecting the payoff.

4. Suppose we have two component cycles $C_1$ and $C_2$ in $C$ such that
$\min(p(C_1)) = \min(p(C_2)) = \min(p(C|_G))$. Then one of the cycles can be
ignored without affecting the payoff.

It follows from the above that if the finite-memory strategy $\pi_1$ is an optimal one,
then it can be reduced to a strategy $\pi_1'$ such that if Player 2 fixes a finite-memory
counter-optimal strategy $\pi_2$, then every cycle $C$ in the finite graph obtained from
fixing $\pi_1'$ and $\pi_2$ is also a cycle in the original game graph. Since finite-memory
optimal strategies exist for Player 2, a correspondence between the value of the game
and the value of the following finite-cycle forming game can be established. The
finite-cycle forming game is played on $G$ and the game stops when a cycle $C$ is
formed, and the payoff is as follows: if $\min(p(C))$ is even, then the payoff for
Player 1 is the average of the weights of $C$; otherwise the payoff for Player 1
is $-1$. The existence of pure memoryless optimal strategies in the finite-cycle
forming game can be obtained from the results of Björklund et al. [3]. This
concludes the proof of the following theorem.
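
The payoff rule of the finite-cycle forming game is simple enough to state as code. The sketch below assumes scalar integer rewards on edges and priorities on states; the dictionary-based representation and the name `cycle_payoff` are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction

def cycle_payoff(cycle, priority, reward):
    """Payoff for Player 1 once the play closes `cycle`.

    cycle    -- list of states [s_0, ..., s_{k-1}]; the last state leads back to s_0
    priority -- dict mapping each state to its priority
    reward   -- dict mapping each edge (u, v) to an integer reward
    """
    # An odd minimal priority means the parity objective is violated.
    if min(priority[s] for s in cycle) % 2 != 0:
        return Fraction(-1)
    # Otherwise the payoff is the average reward along the edges of the cycle.
    edges = zip(cycle, cycle[1:] + cycle[:1])
    return Fraction(sum(reward[e] for e in edges), len(cycle))
```

For instance, a two-state cycle with priorities 0 and 1 and edge rewards 4 and 2 yields payoff 3, whereas a self-loop on a state of priority 1 yields $-1$ regardless of its reward.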

Theorem 4. For all lexicographic mean-payoff parity games, if Player 1 has a 
finite-memory optimal strategy, then she has a memoryless optimal strategy. 

It follows from Theorem 4 that the problem of deciding whether there is a finite-memory
optimal strategy for Player 1 is in NP. The NP procedure goes as follows: we
guess the value $v_0$ of state $s_0$ and verify that the value at $s_0$ is no more than
$v_0$. We can decide in coNP whether the value at a state is at least $v$, for $v \in \mathbb{Q}$.
Thus, we can decide in NP whether the value at state $s_0$ is no more than $v_0$ (as it
is the complement). Then, we guess a memoryless optimal strategy for Player 1
and verify (in polynomial time) that the value is at least $v_0$ given the strategy.

4 Quantitative Verification and Synthesis 

We are interested in the verification and the synthesis problem for quantitative 
specifications given by a lexicographic mean-payoff (parity) automaton. In the 
following simple lemma we establish that these automata also suffice to express 
qualitative properties. 

Lemma 4. Let $A = (G, p)$ be a deterministic parity automaton and let $A' =
(G', r)$ be a lexicographic mean-payoff automaton. We can construct a lexicographic
mean-payoff parity automaton $A \times A' = (G \times G', r, p)$, where $G \times G'$ is
the product graph of $G$ and $G'$, such that for any word $w$ and associated run $\rho$,
$LMP_{A \times A'}(\rho) = -1$ if the run of $w$ is lost in $A$, and $LM_{A'}(\rho')$ otherwise, where
$\rho'$ is the projection of $\rho$ on $G'$.

Note that $(L_A, L_{A'}) = L_{A \times A'} + 1$, assuming that $\inf_{w \in \Sigma^\omega} L_{A'}(w) = 0$. If $A$ is a
safety automaton, the language $(L_A, L_{A'})$ can be presented by a lexicographic
mean-payoff automaton (see Example 4). Thus, lexicographic mean-payoff
automata suffice to express both a quantitative aspect and a safety aspect of a
specification. Lexicographic mean-payoff parity automata can be used to
introduce a quantitative aspect to liveness specifications and thus to the usual
linear-time temporal logics.

Example 4. Let us resume Example 1. Fig. 5 shows a safety automaton $B$ for
the specification $G(r \to g \lor X g)$. It also shows the mean-payoff automaton $C$ for
$(L_B, L_{A_2})$. (See Fig. 2 for the definition of $A_2$.)

4.1 Quantitative Verification 

We now consider the verification problem for quantitative specifications. For 
qualitative specifications, the verification problem is whether an implementa- 
tion satisfies the specification for all inputs. For quantitative specifications, the 
problem generalizes to the question whether an implementation can achieve a given
value independent of the inputs. 

Let $A = ((S, s_0, E), r, p)$ be a lexicographic mean-payoff parity automaton
and let $M = (Q, q_0, \delta)$ be a Mealy machine. The quantitative verification
problem is to determine $L_A(M)$. The corresponding decision problem is whether
$L_A(M) \geq c$ for a given cutoff value $c$. Clearly, verification of qualitative
languages is a special case in which the cutoff value is 1.

Fig. 5. Safety automaton $B$ for $G(r \to g \lor X g)$ and automaton $C$ for $(L_B, L_{A_2})$.

Theorem 5. The value $L_A(M)$ can be computed in time $O(|S| \cdot |Q| \cdot |E| \cdot |\delta| \cdot d \cdot \lg(|Q| \cdot |\delta| \cdot |r|))$.

Proof. We reduce the lexicographic mean-payoff parity automaton to a mean-payoff
parity automaton $A'$ using the reduction stated in Section 3.2 and build
the product automaton of $A'$ and $M$. Then, we check whether it contains a cycle that
is not accepted by the parity algorithm [18]. If so, we return $-1$. If not, in the
second step we find the minimal mean-weight cycle [17].
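
The second step of the proof can be illustrated with Karp's classical minimum-mean-cycle algorithm. This is a sketch under assumptions: it is not claimed to be the exact procedure of [17], it expects scalar integer weights (that is, the product graph after the reduction of Section 3.2), and it expects the graph to be restricted to states reachable from the initial state. The function name and the edge-list representation are illustrative.

```python
from fractions import Fraction
import math

def min_mean_cycle(num_nodes, edges):
    """Minimum mean weight over all cycles, or None if the graph is acyclic.

    num_nodes -- nodes are numbered 0 .. num_nodes - 1
    edges     -- list of (u, v, w) with integer weight w
    """
    n = num_nodes
    # d[k][v] = minimum weight of a walk with exactly k edges that ends in v.
    # Starting with d[0][v] = 0 for every v lets walks begin anywhere.
    d = [[math.inf] * n for _ in range(n + 1)]
    d[0] = [0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = None
    for v in range(n):
        if d[n][v] == math.inf:
            continue  # no walk of length n ends here
        # Karp's characterization: max over k of (d_n(v) - d_k(v)) / (n - k).
        worst = max(Fraction(d[n][v] - d[k][v], n - k)
                    for k in range(n) if d[k][v] != math.inf)
        if best is None or worst < best:
            best = worst
    return best
```

The running time of this sketch is proportional to the number of nodes times the number of edges. In the setting of the proof, the result is the worst-case value of the implementation once the parity check has succeeded.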

Example 5. In Example 1, we computed the values of Implementations $M_1$,
$M_2$, and $M_3$ (Fig. 1) for the specifications $A_1$ and $A_2$ given in Fig. 2.
Specification $A_1$ requires the number of grants to be minimal. Under this specification,
$M_3$ is preferable to both other implementations because it only produces half
as many grants in the worst case. Unfortunately, $A_1$ treats a grant the same
way regardless of whether a request occurred. Thus, this specification does not
distinguish between $M_1$ and $M_2$. Specification $A_2$ only punishes "unnecessary"
grants, which means that $A_2$ prefers $M_2$ and $M_3$ to $M_1$.

A preference between the eagerness of $M_2$ and the laziness of $M_3$ can be
resolved in either direction. For instance, if we combine the two quantitative
languages using addition, lazy implementations are preferred.

4.2 Quantitative Synthesis 

In this section, we show how to automatically construct an implementation from
a quantitative specification given by a lexicographic mean-payoff (parity)
automaton. First, we show the connection between automata and games, and
between strategies and Mealy machines, so that we can use the theory from
Section 3 to perform synthesis. Then, we define different notions of synthesis
and give their complexity bounds.

We will show the polynomial conversions of an automaton to a game and of a
strategy to a Mealy machine using an example.

Example 6. Fig. 6 (left) shows the game $G$ corresponding to the automaton $C$
shown in Fig. 5. Note that the alphabet $2^{AP}$ has been split into an input alphabet $2^I$
controlled by Player 2 (squares) and an output alphabet $2^O$ controlled by Player 1
(circles). Accordingly, each edge $e$ of $C$ is split into two edges $e_2$ and $e_1$; the
reward of $e_2$ is zero and the reward of $e_1$ is double the reward of $e$. It should be
clear that with the appropriate mapping between runs, the payoff remains the
same. Because we want a Mealy machine, the input player makes the first move.

Fig. 6. A game (optimal strategy shown in bold) and corresponding Mealy machine

The figure also shows an optimal strategy (bold edges) for $G$ with payoff 2.
The right side of the figure shows the Mealy machine $M$ corresponding to the
strategy. It is constructed by a straightforward collection of inputs and chosen
outputs. It is easily verified that $L_C(M) = 2$.
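
The claim that splitting edges preserves the payoff can be checked on any cycle: each automaton edge with reward $r$ becomes an input edge with reward 0 followed by an output edge with reward $2r$, so both the sum of rewards and the number of edges double. The snippet below is a small illustration with made-up rewards; the function names are assumptions.

```python
from fractions import Fraction

def split_rewards(automaton_rewards):
    """Rewards along the game path obtained by splitting every automaton edge
    into an input edge (reward 0) and an output edge (twice the reward)."""
    game_rewards = []
    for r in automaton_rewards:
        game_rewards += [0, 2 * r]
    return game_rewards

def mean(rewards):
    """Average reward of a finite, non-empty sequence of integer rewards."""
    return Fraction(sum(rewards), len(rewards))
```

For a cycle with rewards 1, 3, 2 in the automaton, the corresponding game cycle has rewards 0, 2, 0, 6, 0, 4, and both have mean 2.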

Definition 1. Let $L$ be a quantitative language and let $c \in \mathbb{R}^d$ be a cutoff value.
We say that $L$ is $c$-realizable if there is a Mealy machine $M$ such that $L(M) \geq c$.
We say that $L$ is limit-$c$-realizable if for all $\varepsilon > 0$ there is a Mealy machine $M$
such that $L(M) + \varepsilon \geq c$.

Suppose the supremum of $L(M)$ over all Mealy machines $M$ exists, and denote
it by $c^*$. We call $L$ realizable (limit-realizable) if $L$ is $c^*$-realizable
(limit-$c^*$-realizable). A Mealy machine $M$ with value $L(M) \geq c^*$ ($L(M) + \varepsilon \geq c^*$) is
called optimal ($\varepsilon$-optimal, resp.).

Clearly, realizability implies limit-realizability. Note that by the definition of
supremum, $L$ is limit-$c^*$-realizable iff $c^*$ is defined. Note also that realizability
for qualitative languages corresponds to realizability with cutoff 1. Synthesis is
the process of constructing an optimal ($\varepsilon$-optimal) Mealy machine. Note that
for a cutoff value $c$, if $L$ is $c$-realizable, then we have that $L(M) \geq c$ for any
optimal Mealy machine $M$. If $L$ is limit-$c$-realizable, then $L(M_\varepsilon) + \varepsilon \geq c$ holds
for any $\varepsilon$-optimal Mealy machine $M_\varepsilon$.
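
The difference between $c$-realizability and limit-$c$-realizability can be made concrete with a small sketch. It assumes that values and cutoffs are tuples compared lexicographically and that adding $\varepsilon$ means adding it to every component; the latter is an assumption of this sketch, as is the hypothetical family of machines $M_k$ with value $1 - 1/k$ in dimension $d = 1$.

```python
from fractions import Fraction
from math import ceil

def meets_cutoff(value, cutoff, eps=Fraction(0)):
    """Check value + eps >= cutoff in the lexicographic order on tuples.
    Assumption: eps is added to every component of the value."""
    shifted = tuple(v + eps for v in value)
    return shifted >= tuple(cutoff)

def family_value(k):
    """Value of the hypothetical machine M_k: the 1-tuple (1 - 1/k)."""
    return (Fraction(k - 1, k),)

def witness_index(eps):
    """Smallest k such that M_k meets the cutoff (1,) up to eps."""
    return ceil(1 / Fraction(eps))
```

No machine in this family meets the cutoff 1 exactly, yet for every $\varepsilon > 0$ the machine $M_k$ with $k \geq 1/\varepsilon$ comes within $\varepsilon$ of it, so the family witnesses limit-1-realizability without witnessing 1-realizability.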

Example 7. We have already seen an example of a realizable specification
expressed as a mean-payoff automaton (see Figs. 2 and 5 and Example 4).
Example 3 shows a language that is only limit-realizable.

For the combination of safety and quantitative specifications, we have Theorem 6. 

Theorem 6. Let $A = ((S, s_0, E), r)$ be a lexicographic mean-payoff automaton
of dimension $d$, and let $c$ be a cutoff value. The following assertions hold.

1. $L_A$ is realizable (hence limit-realizable); $L_A$ is $c$-realizable iff $L_A$ is
limit-$c$-realizable.

2. $c$-realizability (and by (1) limit-$c$-realizability) of $L_A$ are decidable in
NP $\cap$ coNP.

3. An optimal Mealy machine can be constructed in time $O(|E|^{4d+6} \cdot |r|)$.

The first result follows from the existence of memoryless optimal strategies for
lexicographic mean-payoff games. The second and third results follow from the
complexity and algorithms of solving these games. (See Theorem 1.) For liveness,
we have the following result.

Theorem 7. Let $A = ((S, s_0, E), r, p)$ be a lexicographic mean-payoff parity
automaton of dimension $d$ and let $c$ be a cutoff value. The following assertions
hold.

1. $L_A$ is limit-realizable, but it may not be realizable; limit-$c$-realizability of $L_A$
does not imply $c$-realizability.

2. Realizability and $c$-realizability of $L_A$ are decidable in NP, and
limit-$c$-realizability of $L_A$ is decidable in coNP.

3. For $\varepsilon > 0$, an $\varepsilon$-optimal Mealy machine can be constructed in time
$O(|S|^{|p|} \cdot |E|^{4d+6} \cdot |r| + \frac{1}{\varepsilon})$. If $L_A$ is realizable, then an optimal Mealy machine can be
constructed in time $O(|S|^{|p|} \cdot |E|^{4d+6} \cdot |r|)$.

Explanation: Following Theorem 4, realizability and $c$-realizability can be decided
in NP. We have that $L_A$ is limit-$c$-realizable iff $c$ is not higher than the
value of the initial state, which can be decided in coNP (Theorem 2).
Limit-realizability follows from Theorem 3.

Example 8. In Example 3 we discussed the specification $\varphi = G(r \to F g)$. In
combination with the quantitative language given by $A_3$ in Fig. 3, this
specification is optimally realizable by a finite implementation: implementations $M_1$ and
$M_2$ from Fig. 1 are two examples. The combination of $\varphi$ and the quantitative
language given by $A_1$ in Fig. 2 only yields a specification that is optimally
limit-realizable. Automaton $A_1$ prefers as few grants as possible. An implementation
that is optimal within $1/k$ could simply give a grant every $k$ cycles. It may not
be useful to require that something happens as infrequently as possible in the
context of liveness specifications. Instead, more subtle approaches are necessary;
in this case we could require that unnecessary grants occur as little as possible.
(Cf. $A_2$ in Fig. 2.)

5 Conclusions and Future Work 

We introduced a measure for the "goodness" of an implementation by adding 
quantitative objectives to a qualitative specification. Our quantitative objec- 
tives are mean-payoff objectives, which are combined lexicographically. Mean- 
payoff objectives are relatively standard and, as we demonstrated, sufficiently 
expressive for our purposes. Other choices, such as discounted objectives [11], 
are possible as well. These give rise to different expressive powers for specification 
languages [7]. 

Finally, we have taken the worst-case view that the quantitative value of an
implementation is the worst reward of all runs that the implementation may
produce. There are several alternatives. For instance, one could take the
average-case view of assigning to an implementation some expected value of the cost
taken over all possible runs, perhaps relative to a given input distribution.
Another option may be to compute admissible strategies. It can be shown that
such strategies do not exist for all mean-payoff games, but they may exist for an
interesting subset of these games.